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(57) Abstract 



A system for optimizing the order of read/write commands in a disk file system is described, An optimizing scheduler blocks the 
.transmission of disk access requests from an application program to a disk controller until a relatively large set of read/write requests is 
collected. The scheduler then sorts the set of read/write requests into an order which corresponds to the physical distribution of sectors on 
the disk accessed by the set of read/write requests. 
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OPTIMIZING SCHEDULER FOR READ/WRITE 
OPERATIONS IN A DISK HLE SYSm4 

A portion of the disclosure of this patent document contains 
material which is subject to copyright protection. The copyright owner 
has no objection to the facsimUe reproduction by anyone of the patent 
disclosure, as it appears in the Patent and Trademark Office patent files 
or records, but otherwise reserves all copyright rights whatsoever. 

FIELD OF TTfV INWMTTOM 

The present invention relates generally to the field of computer . 
operating systems, and more particularly to scheduling read/write 
requests in a disk-based file system. 

BACKGROU ND OF TfTF TNVENTTnN 

Disk based operating systems divide a disk into physical sectors 
which represent the fundamental units for data storage on the disk. A 
sector is a portion of the circumference of a track on the disk, and the 
read/write mechanism within the disk drive can access specific sectors 
on the disk directly. When an application issues a command to read a 
specific byte ftrom a disk file, the file system locates the correct surface, 
track, and sector, reads the entire sector into a memory buffer, and then 
locates the requested byte within that buffer. Because sector sizes 
allocated on a disk may be small in relation to the total amount of data 
required in a typical data transaction, the read /write head may be forced 
to seek among several different sectors in response to a read or write 
command by an application program. 

The physical read/write mechanism of a hard drive is generally 
in one of three states: idle, seek, or input/output (I/O). In order to 
maximize the I/O bandwidth of the disk drive, the time that the drive 
spends in the idle and seek states must be minimized. A seek state 
corresponds to moving the read/write mechanism to the correct sector, 
and the time required by a seek operation is referred to as the seek time'. 
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One method used in present disk drive systems of minimizing 
idle times is to ensure that new read /write requests are available as flie 
previous read /write operations complete. This essentially creates a 
queue of read /write requests which minimizes the time gap (idle time) 
between individual read/write operations. Likewise, a method used in 
present disk drive systems of minimizing seek times is to analyze the 
queued read/write requests and perform them in the most efficient 
order with regard to the order of the sectors as they are accessed on the 
disk. A shortcoming of these present systems is that because they 
attempt to minimize the latency between the time a request is initiated 
and the time the operation is performed, the number of requests that 
are queued at any given time is also minimized. The minimal queue 
length consequently limits the number of requests which can be 
optimally ordered, thus resulting in a less than optimal ordering in 
cases where the number of requests exceeds the size of the queue. 

Many general purpose file systems are suitable for use with 
applications for which minimal optimization of read/ write requests 
occurs. Certain applications, however, require sets of data transfer 
operations to disk to be completed within a maximum time limit. This 
time requirement translates to a minimum disk input/ output 
bandwidth, and is typically measured in terms of number of bytes 
transferred to or from the disk per unit time. For such applications, 
excessive seek times may increase disk access times the beyond the 
minimum bandwidth requirements. Examples of I/O intensive 
applications include real-time applications which require extensive 
disk access. Although a common solution to this problem may be to 
use faster disk drives, design and cost constraints may prevent the use 
of adequately fast disk drives. 

It is therefore an intended advantage of the present invention to 
provide a system for optimizing a large number of disk access requests 
in a disk file system for use with a broad range of disk drive devices. 
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SUMMARY OF THE INVF^KFTTOM 

The present invention discloses a method for increasing the disk 
I/O bandwidth in a disk based file system. Disk access requests from an 
application program to a drive controller are blocked until a relatively 
large set of read /write requests are collected. An optimizing scheduler 
sorts the order of the requests so that the order of the set of read /write 
requests corresponds to the physical distribution of sectors on a disk 
which are to be accessed by each read /write request. The ordered set of 
read/write requests are then transmitted to the drive controller. 

According to one embodiment of the present invention the disk 
input/output queue, is blocked from being processed for a fixed period 
of time while read /write requests accumulate in a buffer. The 
optimizing scheduler then optimizes the entire queue in a single pass 
and sends the operations to the drive controller in the optimized order. 

An embodiment of the present invention effectively increases 
the size of the queue of read/write requests to the disk drive, so that 
new requests inserted into the queue have a greater chance of being 
optimally placed between requests already in the queue, thus 
improving the average seek time for the set of requests. 

Other features of the present invention will be apparent from 
the accompanying drawings and from the detailed description which 
follows. 

BRIEF DESCRIPTION OP THE DRAWTNnq 

The present invention is illustrated by way of example and not 
limitation in the figures of the accompanying drav^ings, in which like 
references indicate similar elements, and in which: 

Figure 1 is a block diagram of a computer system which may be 
used to implement an embodiment of the present invention. 
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Figure 2 illustrates an example of sector distribution on disk 
media. 

Figure 3 is a block diagram of a disk file system which uses- an 
optimizing command scheduler according to one embodiment of the 
present invention. 

Figure 4 is a flowchart illustrating the process of re-ordering sets 
of read /write requests to optimize disk drive performance according to 
one embodiment of the present invention. 

Figure 5 is a table which illustrates the relative seek distances 
required in different command queuing systems, including a system 
according to one embodiment of the present invention. 

DETAILED DESCRTPTTOKT 

A system for collecting and optimizing read /write commands in 
a disk file system is described. In the following description, for 
purposes of explanation, numerous specific details are set forth in order 
to provide a thorough understanding of the present invention. It will 
be eviderit, however, to one skilled in the art that the present 
invention may be practiced without these specific details: In other 
instances, well-known structures and devices are shown in block 
diagram form to facilitate explanation. 

Hardware O verview 

Figure 1 illustrates a block diagram of a computer which may be 
used to implement an embodiment of the present invention. The 
computer system 100 includes a processor 102 coupled through a bus 
101 to a random access memory (RAM) 104, a read only memory (ROM) 
106, and display device 120. Keyboard 121 and cursor control imit 122 
are coupled to bus 101 for commimicating information and command 
selections to processor 102. Also coupled to processor 102 through bus 
101 is an input/output (I/O) interface 123 which can be used to control 
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and transfer data to electronic devices connected to computer 100. 

Mass storage device 107 is also coupled to processor 102 through 
bus 101. Mass storage device 107 represents a memory device which 
stores data accessible from processor 102 through a sector-based file 
system used by the computer system 100, Mass storage device 107 may 
be a persistent storage device, such as a floppy disk drive or a fixed disk 
drive (e.g., magnetic, optical, magneto-optical, or the like), which can 
directly access locations on the disk for reading and writing data to and 
from the disk media. Alternatively, mass storage device 107 could be a 
tape drive which accesses data blocks placed sequentially on streaming 
tape media. It should be noted that the architecture of Figure 1 is 
provided only for purposes of illustration, and that a computer used in 
conjunction with the present invention is not limited to this specific 
architecture. 

Command Optimization 

Disk file systems organize data on disks in sectors. Due to file 
fragmentation or sector interleaving, data relating. to a particular file 
may be spread among several discontiguous sectors. Figure 2 illustrates 
an example of the distribution of related sectors on a disk file. Disk 200 
contains several tracks and four sectors containing data. The sectors are 
numbered 1, 2, 3, and 4; If an application requests to read to data from 
these sectors, a file system which employs a simple command FIFO 
(first-in, first-out buffer) would simply access the sectors in the order 
the commands are received. Thus, with reference to Figure 2, if the 
application requested data to be read from sector 1, sector 2, sector 3, and 
then sector 4, the read /write head of the disk drive would access the 
sectors in that order. As can be seen in Figure 2, however, this order 
. requires several seek operations between each read operation since each 
sector is on a different track, and the sectors are not arranged on disk in 
the order in which they are requested. 

Certain known disk file systems use a command queuing system 
to minimize the number of seeks between sector read operations by 
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ordering the sequence of the sectors as they are accessed by the 
application to correspond to the physical order of the sectors on the 
disk. For example, given the sector distribution illustrated in Figure 3, 
an optimal order to the read/ write requests would be 1, 4, 3, 2, As can 
be seen in Figure 2, this order eliminates any extra track crossings 
between sequential sector accesses. 

Present systems of efficiently ordering disk read/write requests, 
such as command queuing systems, however, provide only a limited 
queue size in which commands are re-ordered. Queue size is limited 
because these systems also attempt to minimize the read /write cycle 
latency which is the measure of the amount of time between the time 
that a read/ write request is initiated and the time that it is performed. 
As the length of the queue is increased, the latency correspondingly 
increases. Thus, in the example provided above, if the queue can only 
accommodate two instructions while an access command is being 
completed, the system would cause the sectors to be read in the order of 
1-3-2-4. Given the distribution of the sectors in Figure 2, this re-ordered 
scheme does not reduce the number of seek operations from the 
original order of 1-2-3-4. Although Figure 2 illustrates an example in 
which only four sectors are accessed, it will be appreciated that most 
applications request disk accesses which involve many sectors. In these 
cases, the limitations of the simple FIFO and command queuing 
systems are amplified. 

Figure 3 illustrates a system according to one embodiment of the 
present invention'in which the effective size of the read /write 
command queue is increased so that more read /write commands are 
inserted in the queue to be placed in the optimum order. In Figure 3, 
application 302 generates read /write requests for data on a disk 
controlled by drive controller 306. Drive controller 306 represents both 
the controller hardware circuit within the disk drive, as well as the 
driver firmware executed by the disk drive. Drive controller 306 sends 
control signals which control the movement and read /write 
functionality of head mechanism 308. 
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In one embodiment of the present invention, an optimizing 
scheduler 304 receives the read/write requests from application 302. 
Optimizing scheduler 304 collects a number of read /write requests, and 
then orders these requests so that.seek operations between the accessed 
sectors is minimized. 

In a method of the present invention, the optimizing scheduler 
304 blocks the queue and holds the read/write requests in a buffer for a 
specified period of time. This period of time defines a scheduler period 
which can be varied depending on factors such as the rate of requests 
issued by the application, the I/O bandwidth requirements of the 
system, and the amount of fragmentation on the disk. When the end 
of the scheduler period is reached, the optimizing scheduler optimizes 
the entire queue in a single pass and sends the requests to the disk in 
the optimized order. 

Alternatively, instead of a period of time representing the 
threshold condition for releasing the requests, the optimizing 
scheduler could be configured such that a specified number of 
accumulated requests serves as the threshold condition. For example, 
the optimizing scheduler could accumulate 100 requests before 
optimizing the requests. For this embodiment, optimizing scheduler 
304 includes a counter which maintains a count of the number of 
read /write requests received from application 302. 

According to one embodiment of the present invention, the 
optimizing scheduler 204 sorts the read/write requests so that the order 
of the requests corresponds to the order in which the sectors to be read 
are sequentially distributed on the disk. Thus, with reference to Figure 
2, the optimum order of the sectors is 1-4-3-2, since these sectors are 
contained in sequential tracks from outermost to innermost track in 
this particular order. Furthermore, the sorting direction between sets 
of read/write requests is exchanged so that the drive mechanism 
sweeps across the disk in one direction first, then back in the other 
direction. This eliminates the need for the drive mechanism to move 
back to the opposite side of the disk after the innermost or outermost 
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track of the disk has been reached. 

Figure 5 provides a table which summarizes the access request 
order and resulting number of seeks for the simple FIFO and two- 
command queue system described above, in comparison with the 
results obtained for a scheduler according to an embodiment of the 
present invention. The sector numbers provided in the Request Order 
column of Figure 5 correspond to the sector numbers illustrated in 
Figure 2. 

With regard to the method in which the optimizing scheduler 
304 sorts the order of the read /write requests, a simple sorting 
algorithm is implemented in which the sort operation is performed on 
all collected requests in one pass. Alternatively, the optimizing 
scheduler implements an incremental sorting algorithm in which 
requests are placed into their optimal order as they are received. 

In a further alternative of the present invention, the read/ write 
operations are prioritized, so that background operations can be 
performed without impacting the real-time performance of the system. 
As the distribution of the read /write operations approaches worst case, 
background read/write requests can be deferred to a later set of requests 
so that higher priority read /write operations are completed in time. 
For better case sets of read / write operations, background read/ write 
requests could be interleaved (and still seek-optimized) without 
causing real-time problems. 

Figure 4 is a flowchart illustrating the major steps of optimizing 
the execution order of disk read /write commands according to a 
method for one embodiment of the present invention, and with 
reference to the file system illustrated in Figure 3, In step 402, the 
optimizing scheduler 304 collects disk read/write requests sent from the 
application program 302 for one scheduler period. The scheduler 
period corresponds to the number of read/write requests which are 
held in the read/write queue before the order of the read/write requests 
is optimized. After the specified number of read /write requests are 
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coUected in the optimizing scheduler buffer, the read /write requests are 
re-ordered so that the distance of seek, operations between consecutive 
read /write operations is minimized, step 404. The read /write 
commands are then transmitted from the optimizing scheduler 304 to 
the drive controller 306 in the optimized order, step 406. If there are 
additional disk read/write requests issued from the application, the 
process repeats from step 402, otherwise the process ends. 

As has been mentioned earlier, increasing the command queue 
size increases the latency between the time a read /write request is 
initiated and the time the operation is performed. This is not generally 
a problem for write operations and is not a problem for read operations 
if the disk locations to be accessed are known far enough in advance (as 
is usually the case for media tracks). Moreover, a larger queue requires 
larger data buffers than conventional queuing methods, however, this 
is often an acceptable trade-off for increased disk I/O bandwidth. 

One embodiment of the present invention provides a file system 
method which is suitable for use with applications with high disk I/O 
bandwidth requirements in which excessive disk access time may 
negatively impact application performance, A method of the present 
invention provides a system for optimizing the order in which disk 
read /write commands are executed, so that disk seek operations may be 
reduced to satisfy the disk I/O bandwidth requirements of the 
application programs. 

One exemplary application of the optimizing scheduler is its use 
in a computer implemented digital player/recorder including audio 
record and playback programs which read and write multiple tracks of 
data simultaneously to the disk. Such an application has rigorous 
minimum disk I/O bandwidth requirements for playback, and the disk 
head may be forced to seek among blocks on the disk in order to play a 
single track. In some instances, the data samples may not be read from 
the disk within the required time, in which case, the audio data played 
back may be interrupted or otherwise distorted. The ordering of large 
sets of read/write requests in accordance with the physical distribution 
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of sectors on the disk, as described in reference to Figure 4, minimizes 
the chance that applications fail due to excessive seek operations 
causing increased data transfer cycles. 

Although the above discussion was written in the context of 
audio applications, it should be noted that a method of the present 
invention could be used in other similar applications, such as 
applications which involve both video and audio content, or 
applications which involve high-speed calculations on limited 
amounts of data. 

In one embodiment of the present invention, the optimizing 
scheduler 304 illustrated in Figure 3 is implemented as a program 
which is executed by a processor coupled to the disk drive which is to be 
accessed (e.g., processor 102 in computer system 100). The optimizing 
scheduler program could be a program which is incorporated as part of 
the disk file system, or it could be a stand-alone program which can be 
called by the disk file system or an application program. Appendix A ' 
provides a detailed listing of C++ program code which implements an 
optimizing scheduler according to one embodiment of the present 
invention. It will be appreciated however, that methods of the present 
invention are not limited to the programming language and exact code 
sequences provided. 

The steps of a method of the present invention may be 
implemented by a central processing unit (CPU) in a computer 
executing sequences of instructions stored in a memory. The memory 
may be a random access memory (RAM), read-only memory (ROM), a 
persistent store, such as a mass storage device, or any combination of 
these devices. Execution of the sequences of instructions causes the 
CPU to perform steps according to the present invention The 
instructions may be loaded into the memory of the computer from a 
storage device or from one or more other computer systems over a 
network connection. Consequently, execution of the instructions may 
be performed directly by the CPU. In other cases, the instructions may 
not be directly executable by the CPU. Under these circimistances, the 
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instructions may be executed by causing the CPU to execute an 
interpreter that interprets the instructions, or by causing the CPU to 
execute instructions which convert the received instructions to 
instructions which can be directly executed by the CPU. In other ■ 
embodiments; hardwired circuitry may be used in place of, or in 
combination with, software instructions to implement the present 
invention. Thus, the present invention is not limited to any specific 
combination of hardware circuitry and software, nor to any particular 
source for the instructions executed by the computer. 

In the foregoing, a system has been described for collecting and 
optimizing the order of read /write requests in a disk file system. 
Although the present invention has been described with reference to 
specific exemplary embodiments, it will be evident that various 
modifications and changes may be made to these embodiments 
without departing from the broader spirit and scope of the invention as 
set forth in the claims. Accordingly, the specification and drawings are 
to be regarded in an illustrative rather than a restrictive sense. 
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Appendix A 



// NAME qfssched.hpp 
// 

// DESCRIPTION 

// Quick version of scheduler 
// 

// COPYRIGHT 

// copyright 1994 by Advanced digital Systems Group. All righti 

reserved . 

// 

//• VERSION CONTROL 

// $Header: G : /PVCS/ARCHIVES/hchallel/QFSSCHED.HPV l,li 
AUG 14, 1997 12:01:16 JCLAAR $ 



#include <dbllnk.hpp> 
#include "task.hpp" 
#include <tiinedef.h> 
#include <sndf ile .hpp> 
#include "refcount . hpp" 

class SyloObject; 

class SyloReferenceObject : public SyReferenceCount 
( 

SyloObject* mIoObject; 
public: 

inline SyloReferenceObject (SyloObject* ioObject); 



# i f nde f _QFSSCHED_HPP 
#define_QFSSCHED_HPP. 



# include <;vector> 



#include <list> 
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-SyloRef erenceObject ( ) ; 

inline void BuildloList ( IoNodeList& ioList); 
inline Syloobject* loObjectO const; 
inline SyQFS* FileSystem (void) ; 
inline void RemoveloObject ( ) 
void AddNotifyO ; 

); 

class SyQuickScheduler : public SyTask 
{ 

SySync : : Sycri ticalSection mObj Access ; 

SySync: rSycriticalSection mReadyObj Access ; 

typedef STD: : vector<SyIoObject*, 
STD: : allocator, SyIoObject*>> loObjects; 

typedef STD: : list<SyioRef erenceObject* , 
STD: :allocator<SyIoReferenceObject*» ReadyloObjects ; 

loObjects mloObject; 
ReadyloObjects xnReadyloObject; 
UInt32 mPeriod; 
UInt32 mResolution; 

Bool mSortForward; // Sort direction for I/Os 
void SortReadyList (IoNodeLists& listToSort) ; 

protected: 

virtual UInt32 Iterate (Ubtl6 index): 
public: 

HCHAN.EXP SyQuickScheduler (Uint32 period, UInt32 
resolution) ; 

HCHAN_EXP -SyQuickSchedulerO; 

virtual void AddObject (SyloObject* object); 

virtual void ReraoveObject (SyloObject* object); 

Virtual, void AddReadyObject (SyloRef erenceObject* object ): 

virtual void RemoveReadyObject (SyloObject* object ) : 

inline UInt32 Period () const; 
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inline UInt32 Resolution () const; 
inline TimeCode loPeriodO const; 

#inline "qfssched. inl" 

•#endif 

// NAME qfssched. cpp 
// 

//DESCRIPTION 

// Quick version of scheduler 
// 

// COPYRIGHT. 

' // copyright 1997 by Advanced Digital Systems Group. All right 
reserved. 
// 

// VERSION CONTROL 

// $Header: G: PVCS/ ARCHIVES/hchallel /qfssched. cpv 1.37 
AUG 22, 1997 14:03:24 jclaar $ 

// 

#include "precomp.h" 
#include "qf ssched.hpp" 
#include <mutex.hpp> 
tinclude <algorithin> 

void SyloReferenceObject : lAddNotify { ) 
{ 

mIoObject->AddNotify ( ) ; 

) 

* SyloRef erenceObject : : -SyloRef erenceObject { ) 
( 

^ASSERT ( this->GetRef ernceCount ( ) == 0 ) ; 

/ /TRACE (^TCSyloRefobj -Da tret: %ld\"), this ); 
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) - . 

SyQuickScheduler : : SyQuickScheduler 

(' 

UInt32 period. 
Uint32 resolution 

) 

: SyTask 
( 

NULL 
NULL 

FALSE, 
. period, 
true, 

kEBxSchedulerPriority 

), 

mPeriod (period) , 
mResolut ion (resolution) . 
mSof tForward (TRUE) 

( 
) 

SyQuickScheduler : : -SyQuickScheduler ( ) 
( 

if ( ImReadyToObject. empty () ) 

( 

// TRACE (_T(".qsch-dstrct size; %ld\"), inReadyIo0bject->si2e { ) 
); 

for( ReadyloObjects: : iterator iter = mReadyloObject .begin () ; 
iter 1= xnReadyloObject . end () ; 
iter++ 
) 

( 

// TRACE (_T("qsch-dstrct: %ld\n" , (*iter) ); 
( *iter ) ->Decreinent ( ) : 
) 
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) 

this->kill() : 

) 

void SyQuickScheduler : :AddObject (SyloObject* object) 
( - 

^ASSERT {object 1=: NULL; 

this->Stop( ) : 

( 

GrabCriticalSection cs ( fcinObj Access ) ; 
// don't add the object if it is already in here. 
_ASSERT(STD: :find(inIoObject.begin{) . mIoObject .end( ) . 
object) ==mlo0bject .end( ) ) ; 

mloObject .push_back (object) ; 
this->SyTask: :AddObject (object- .Event ( ) ) ; 
) 

void SyquickScheduler: :RemoveObject (SyloObject* object) 
( 

_ASSERT( object 1= NULL); 

this-. Stop 0 : 

( 

GrabCriticalSection cs (&mObj Access ) ; 
loObjects :: iterator i = 

STD: :find(raIoObject.begin() ) , miobject . end( ) , object): 
_ASSERT{1 1 mIoObject.endO ) ; 
mIoObject . erase (i); 

this- . SyTask: : RexnoveObject-> Event ( ) ) : 
) 

this->Start 0 : 

) 

void SyQuickScheduler: :AddReadyObject (SyloReferenceObject* 

object) 

{ 

// 
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// this code protects against duplicate ioobjects. but 
// with the new scheme, I don't think duplicate ioobjects 
//can ever happen. This could save some time (RMD) 
// 

GrabCriticalSection cs (&mReadyObjAccess ) ; 
_ASSERT( object != NULL) ; 

// Don't add the object of it is already in here, 
if )object->GerReferenceCount{) > 2) ) ; 
/ /TRACE (_T("AddReady - cnt %ld\n"). object- 
>GetRef erenceCount ( ) ) , 
return: 

//if (STD: :find(mReadyIoObject->begin() . mReadyloOb j ec t- 
.end( ) . object) 
//return; 

mReadyIoObject.push_back (object) ; 
object->Increment ( ) ; 
object-> AddNotifyO; 
//TRACE)_T("QS-ARO:%ld\n") / object ); 
void SyQuickScheduler: :RemoveReadyObject (SyloObject" object) 

If ( (*Iter)->IoObject() object) 
) 

else 

iter++; 

) 

) • ■ • . 

UInt32 SyquickScheduler: : Iterate (Kntl6 index) 

( 

UInt32 start = SyTimeGetTime ( ) : 
// 

//get the file system that io's will be done on by 

//quering the io object that was signaled 

// 
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ASSERT (mIoObject. sizeO this->NuinObjects ( ) ) : 
#ifdef _DEBUG 

if (index != WAIT_TIMEOUT) 
( 

ASSERT (.index. =0 index, mloObject . size { ) ) : 

^ASSERT(this">WaitObjects() [index] == mIoObject [index]- 
>Event 0 p ; 

) . 

#endif 

// Don't need to get the mObjAccess critical section sin< 
AddObject 

// and RemoveObject stop the scheduler. 
^yOFS* pQFS = NULL; 
if (index != WAIT_TIMEOUT) 
( 

pQFS =inlo0bject [index] ->FileSystera() ; 
) 

else if( imReadyloObject .empty 0 ) 
( 

mReadyOb j Acces s . Enter () : 

pQFS = mReadyloObject. front ()->FileSystem() : 
mReadyOb j Acces s . Leave () : 

) 

/ /TRACE {_T("qsch-index: %ld, readysize : %ld. pQFS: %ld, 
priority: %ld\n"), index, mReadyIoObject->size ( ) , 
pQFS, GetThr€adPriority(this->Thread() ) ); 
//if (index < mlo0bject->size { ) ) 

/ /TRACE {.TC'qsch- iter eventsig: %ld\n"), (mIoObject) 
[ index] ->Event ( ) ) ; 

^ASSERTO (pQFS ==NULL && index WAIT^TIMEOUT) ) ; 

if ( pQFS 1= NULL) 

( 

// ' 
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//copy the current list to a temp list for those items that 
//match the designated filesystem. Protect access to list 
// 

//NOTE: The following code assumes that there is one ■ 
iler 

// per disl volume. If there is not, the code below this 



will 

// assert. 

ReadyloObjects tempList; 
mReadyObj Access . Enter { ) : 

//splice removes all objects from mReadyloObject . 
tempList. splice (tempList.beginO, mReadyloObject); 
mReadyOb jAccessLeave ( ) ; 

#ifdef_DEBUG 

//verify that all objects are really on the same file 
system. 

( 

ReadyToObjects:: iterator iter = tempList . Begin () ; 
for ( ; iter != ten^List.endO ; iter++) 

.ASSERT ((*iter)->Filesystem() == pQFS) ; 

) 

#endif 



// . 

//iterate over the temp list and call BuildloList for all 
objects in 

• //the temp list, if the temp list is empty, then reset the 
event that 

//triggered the scheduler (prevents endless reentry) 
// 



ToNodeList readyloList; 



if (tempList. eir^tyO ) 
( 

if (index != WAIT_TIMEOUT) 
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( 

: : SyResetEvent (ml ©Object ( index) ->Event ( ) ) : 

) 

) 

else 
( 

ReadyloObjects: : iterator iter: 

for(iter = TempList.begin{) : iter != tempList .end( ) ; iter++) 
( 

/ /TRACE {^TC-QS-BIOL %ld\n" ) / (*iter) 
(*iter)->BuildIoList (readyloList) : 
) 
) 

// Sort the list by volume offset 

if ( !readyIoList.empty() ) 

( 

this->SortReadyList (readyloList) ; 
loNodeList: : iterator i = readyloList .begin () ; 
for {; 1 != readyloList.endO ; i++) / 
pQFS->PerformAsynchronousIO{ (♦i) .get () ) ; 

) 

) 

UInt32 last = SyTimeGetTime ( ) - start; 

this-.SetTimeout( ( this-> periodO > last) ? this->period( ) - last 
: 0, FALSE ); 

return D; 
) 

void SyQuic)cScheduler: iSortReadyList (IoNodeList& listToSort) 

/ / Sort the lost bsed on the current value of mSortPorward 
if (mSortPorward) 

listToSort.sort (STD: :greater<SyAutoIoNode () ) ; 

else 

lis tToSort . sort 0 ; 
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// Next time, sort the other way. 
mSort Forward = ! mSort Forward; 



/ NAME gfssched.ini 



// DESCRIPTION 

// Quick version of scheduler 
// 

// COPYRIGHT 

// copyright 1994 by Advanced Digital Systems Group. All right 
reserved. 

// 

// VERSION CONTROL 

// $Header: GrPVCS/ ARCHIVES /hchallel/qfssched.inv 1.4 
Dec 23, 1996 14:21:38 HMD $ . 

// 

# i f ndef „QFSSCHED-INL- 
# de f ine_QFSSCHED- INL_ 

^include "ioobject .hpp" 

. //SyQuickScheduler 



// 



inline UInt32 SyQuickScheduler :: Period ( ) const 



return mPeriod; 



inline UInt32 SyQuickScheduler: : Resolution ( ) const 



inline Timecode SyQuickScheduler : :IoPeriod( ) const 
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return 0x8000; 

) 

// 

/ / Sy I oRe f er enceOb j ec t 
// " 

inline SyloReferenceObject : :SyIoReferenceObject 
( 

Sylobject* ioObject 

) 

; SyRef erenceCount ( ) , 
mloObject (ioObject) 

( 
) 

inline void SyloRef erenceObject : :BuildIoList 
'( 

loNodeListfic ioList 

) 
{ 

// 

// assign member variable locally because Decrement () has 
// the potential to delete 'this'. Even through 'this' is 

passed 

// into BuildloList ( ) , it is only used for comparative 
purposes 

/ . ' 

SyloObject* ploObject = mIoObject; 

this->Decrement ( ) ; 

pIoObject->BuildIoList (ioList, this ); 
inline SyloObject* SyloReferenceObject: : IoObject () const 
( 

return mIoObject; 

) . 

inline SyQFS*SyIoReferenceObject : :FileSystem (void) 
( 
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return (mIoObject ! =NULL /*&& this->GetRef erenceCount ( ) > 
1*/ ) ? mIoObject->FileSystem{)- : NULL; 
. ) ' 

inline void SyloReferenceObject : iRemoveloObject ( ) 
. ( 

mloObject = NULL; 

) 

#endif 
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CLAIMS 

What is claimed is: 

1. A method of executing disk access requests issued by an 
application program, the method comprising the steps of: 

receiving a plurality of disk read /write requests from said 
application program; 

collecting a set of said plurality of read /write requests until a 
predetermined threshold condition is reached; 

sorting the order of said set of read /write requests in relation to 
the physical distribution on a disk of sectors to be accessed by each 
read/write request of said set of read/write requests; and 

transmitting said read/write requests to a drive controller 
controlling said disk in the order created in said sorting step. 

2. A method according to claim 1 wherein said set of 
read/write requests are stored in a buffer memory coupled to said drive 
controller. 

3. A method according to claim 2 wherein said collecting 
step further comprises blocking said plurality of read /write requests 
from being transmitted from said application to said drive controller 
for a predetermined period of time. 

4. A method according to claim 2 wherein said collecting 
step further comprises blocking said plurality of read /write requests 
from being transmitted from said application to said drive controller 
until a predetermined number of read /write requests is accumulated in 
said buffer. 

5. A method according to claim 3 wherein said sorting step is 
performed on said set of read/write requests upon the collection of an 
entire set of read /write requests within said time period. 



6. 



A method according to claim 3 wherein said sorting step i 
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perfonned on each read/write request as it is received into said buffer. 

7. An apparatus for optimizing the order of disk access 
command's issued by an application program, said apparatus 
comprising: 

a buffer receiving a plurality of read /write requests issued by said 
application program, said buffer receiving said plurality of read/write 
requests until a threshold condition is reached, said read/write requests 
received when said threshold condition is reached defining a set of 
read /write requests; 

a scheduler logically coupled to said buffer, said scheduler 
configured to sort said set of read/write requests wherein the order of 
said set of read/write requests corresponds to the physical distribution 
on a disk of sectors to be accessed by each read/write request of said set 
of read /write requests; and 

a drive controller logically coupled to said scheduler for 
controlling access to said disk in accordance with said set of read /write 
requests. 



8. An apparatus according to claim 7 wherein said scheduler 
blocks said plurality of read/write requests from being transmitted from 
said application to said drive controller for a predetermined period of 
time. 

9. An apparatus according to claim 7 wherein said coUecting 
step further comprises a counter, said counter maintaining a count of 
read /write requests received by said buffer, said scheduler configured to 
block said plurality of read /write requests from being transmitted from 
said application to said drive controller until said counter reaches a 
predetermined number of requests. 

10. An apparatus according to claim 8 wherein said scheduler 
sorts said set of read/write requests upon the collection of an entire set 
of read / write requests within said period of time. 



An apparatus according to claim 8 wherein said sorting 
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step is performed on each read /write request as it is received into said 
buffer. 

12. An apparatus for executing disk access requests issued by 
an application program, the method comprising the steps of: 

means for receiving a plurality of disk read /write requests from 
said application program; 

means for collecting a set of said plurality of read/write requests 
until a predetermined threshold condition is reached; 

means for sorting the order of said set of read/write requests in 
relation to the physical distribution on a disk of sectors to be accessed by 
each read/write request of said set of read/write requests; and 

means for transmitting said read/write requests to a drive 
controller controlling said disk in the order created in said sorting step. 

13. An apparatus according to claim 12 further comprising 
buffer means for storing said set of read /write requests. 

14. . A computer readable medium having stored thereon 
sequences of instructions which are executable by a processor, and 
which, which, when executed by the processor, cause the processor to 
perform the steps of: 

receiving a plurality of disk read /write requests from an 
application program; 

collecting a set of said plurality of read /write requests imtil a 
predetermined threshold condition is reached; 

storing said set of read /write requests in a buffer memory; 

sorting the order of said set of read /write requests in relation to 
the physical distribution on a disk of sectors to be accessed by each 
read/write request of said set of read/write requests; and 

transmitting said read /write requests to a drive controller 
controlling said disk in the order created in said sorting step. 

15. A computer according to claim 14 wherein the memory 
further contains instructions which cause the processor to perform the 
step of blocking said plurality of read /write requests from being 
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transmitted from said application to said drive controller for a 
predetermined period of time, 

16. A computer according to claim 14 wherein the memory 
jpurther contains instructions which cause the processor to perform the 
step of blocking said plurality of read /write requests from being 
transmitted from said application to said drive controller until a 
predetermined number of read /write requests is accumulated in said 
buffer. 
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COLLECT DISK READ/WRITE REQUESTS 

FOR ONE SCHEDULER PERIOD 
402 



RE-ORDER THE CURRENT SET OF READ/WRITE 
REQUESTS FOR OPTIMAL DISK DRIVE PERFORMANCE 

. m 



TRANSMIT THE CURRENT SET OF READ/WRITE 
REQUESTS TO THE DISK DRIVE SYSTEM IN THE 
OPTIMIZED ORDER 



406 



FIG. 4 
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